11/30/2018

University of Arkansas Geosciences Colloquia

Statistical Modeling

  • Drawing conclusions based on data while accounting for random variation including sampling and observational errors.

  • Goal: Make inference about the state of the world using data.

  • Most statistics is taught as a recipe.

  • If your data are X then do Y.

  • Where is the creativity? Science?

Problem

  • Where is the science?

  • Don't we know something about the world other than our data is X?

  • How do we add this knowledge into our modeling?

Scientifically Motivated Statistical Modeling

  • Probability model.

    • Model encodes our understanding of the scientific process of interest.

    • Model accounts for as much uncertainty as possible.

    • Model results in a probability distribution.


  • Update model with data.

    • Use the model to generate parameter estimates given data.

Scientifically Motivated Statistical Modeling

  • Criticize the model

    • Does the model fit the data well?

    • Do the predictions make sense?

    • Are there subsets of the data that don't fit the model well?


  • Make inference using the model.

    • If the model fits the data, use the model fit for prediction or inference.

Probability Distributions

  • Start with probability distributions:

    • Data \(\mathbf{y}\).

    • Parameters \(\boldsymbol{\theta}\).

    • \([\mathbf{y}]\) is the probability distribution of \(\mathbf{y}\).

    • \([\mathbf{y} | \boldsymbol{\theta}]\) is the conditional probability distribution of \(\mathbf{y}\) given parameters \(\boldsymbol{\theta}\).

Example: linear regression

\[ \begin{align*} \left[y_i | \boldsymbol{\theta} \right] & \sim \operatorname{N}(X_i \beta, \sigma^2) \\ \boldsymbol{\theta} & = (\beta, \sigma^2) \end{align*} \]
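As a minimal sketch of what this notation means in practice, the snippet below simulates data from the regression model and evaluates the log of \([\mathbf{y} | \boldsymbol{\theta}]\) at two parameter values. The single covariate, the intercept/slope split of \(\beta\), and all numeric values are assumptions chosen only for illustration.

```python
import math
import random

def simulate_linear(x, beta0, beta1, sigma, rng):
    """Draw y_i ~ N(beta0 + beta1 * x_i, sigma^2) for each covariate x_i."""
    return [rng.gauss(beta0 + beta1 * xi, sigma) for xi in x]

def log_likelihood(y, x, beta0, beta1, sigma):
    """Log of [y | theta] for independent normal observations."""
    total = 0.0
    for yi, xi in zip(y, x):
        resid = yi - (beta0 + beta1 * xi)
        total += -0.5 * math.log(2 * math.pi * sigma ** 2) - resid ** 2 / (2 * sigma ** 2)
    return total

rng = random.Random(1)
x = [i / 10 for i in range(50)]
y = simulate_linear(x, beta0=1.0, beta1=2.0, sigma=0.5, rng=rng)

# The parameters that generated the data should score higher than a wrong slope.
ll_true = log_likelihood(y, x, 1.0, 2.0, 0.5)
ll_wrong = log_likelihood(y, x, 1.0, -2.0, 0.5)
```

Comparing `ll_true` with `ll_wrong` shows how the probability model ranks candidate parameter values given the data.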

Model Framework

  • Hierarchical model:

    • A model built in components.

    • Each component represents a different statistical goal.


  • Break the model into components:

    • Data Model.

    • Process Model.

    • Prior Model.


  • Combined, the data model, the process model, and the prior model define a posterior distribution.

Data Model

\[ {\huge \begin{align*} [\mathbf{z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}] & \propto \color{red}{[\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}]} [\mathbf{z} | \boldsymbol{\theta}_P] [\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P] \end{align*} } \]

Data model

\[ {\huge \begin{align*} \color{red}{[\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}]} \end{align*} } \]

  • Describes how the data are collected and observed.
    • Account for measurement process and uncertainty.
    • Model the data in the manner in which they were collected.

  • Data \(\mathbf{y}\).
    • Noisy data.
    • Inexpensive data.
    • Not what you want to make inference on but close.

Data model

\[ {\huge \begin{align*} \color{red}{[\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}]} \end{align*} } \]

  • Latent variables \(\mathbf{z}\).
    • Think of \(\mathbf{z}\) as the ideal data.
    • No measurement error - the exact quantity you want to observe but can't.

  • Data model parameters \(\boldsymbol{\theta}_D\).

Data model: Examples

\[ {\huge \begin{align*} \color{red}{[\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}]} \end{align*} } \]

  • Age of minerals:

    • \(\mathbf{y}\) is the radio-date estimate.

    • \(\mathbf{z}\) is the true mineral age.

    • \(\theta_D\) is the radio-date standard error.

    • The probability distribution is determined by the measurement process.

Data model: Examples

\[ {\huge \begin{align*} \color{red}{[\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}]} \end{align*} } \]

  • Reconstructing climate from tree rings

    • \(\mathbf{y}\) is the tree ring width increment.

    • \(\mathbf{z}\) is the true, unobserved climate variable.

    • \(\boldsymbol{\theta}_D\) models the relationship between climate, stand dynamics, individual heterogeneity, tree age, (etc.) and tree ring width.

    • The probability distribution is determined by tree physiology, measurement uncertainty, etc.

Process Model

\[ {\huge \begin{align*} [\mathbf{z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}] & \propto [\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}] \color{blue}{[\mathbf{z} | \boldsymbol{\theta}_P]}[\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P] \end{align*} } \]

Process Model

\[ {\huge \begin{align*} \color{blue}{[\mathbf{z} | \boldsymbol{\theta}_P]} \end{align*} } \]

  • Where the science happens!

  • Latent process \(\mathbf{z}\) is modeled given data \(\mathbf{y}\).
    • Can be dynamic in space and/or time

  • Process parameters \(\boldsymbol{\theta}_P\).

  • Virtually all interesting scientific questions can be answered through inference about \(\mathbf{z}\).

Process Model: Examples

\[ {\huge \begin{align*} \color{blue}{[\mathbf{z} | \boldsymbol{\theta}_P]} \end{align*} } \]

  • Sediment Mixing:
    • Model different mineral creation events.
    • Model mixing of sediments over time.
    • \(\mathbf{z}\) includes the true unobserved mineral age as well as the discrete mineral creation event.
    • \(\boldsymbol{\theta}_P\) includes the duration of the mineral creation event, the number of mineral creation events, and the relative mixing of rock to produce a sediment.

Process Model: Examples

\[ {\huge \begin{align*} \color{blue}{[\mathbf{z} | \boldsymbol{\theta}_P]} \end{align*} } \]

  • Reconstructing climate with tree rings

    • Trees of the same species share a similar response to climate.

    • Climate variables at nearby sites are, on average, more similar to each other than those at sites far apart.

    • Climate variables separated by short periods of time are more similar than climate variables separated by long periods of time.

    • \(\mathbf{z}\) is the value of the unobserved climate variables.

    • \(\boldsymbol{\theta}_P\) are the species-specific growth responses and the correlation of climate across time and space.

Prior Model

\[ {\huge \begin{align*} [\mathbf{z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}] & \propto [\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}] [\mathbf{z} | \boldsymbol{\theta}_P] \color{orange}{[\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P]} \end{align*} } \]

Prior Model

  • Probability distributions define "reasonable" ranges for parameters.

  • Prior models are useful for a variety of problems:
    • Choosing important variables.
    • Preventing overfitting (regularization).
    • "Pooling" estimates across categories.

Posterior Distribution

\[ {\huge \begin{align*} \color{cyan}{[\mathbf{z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}]} & \propto [\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}] [\mathbf{z} | \boldsymbol{\theta}_P] [\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P] \end{align*} } \]

Posterior distribution

\[ {\huge \begin{align*} \color{cyan}{[\mathbf{z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}]} \end{align*} } \]

  • Probability distribution over all unknowns in the model.

  • Inference is made using the posterior distribution.

  • Because the posterior distribution is a probability distribution, uncertainty is easy to calculate.
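To make the last point concrete, here is a small sketch using the simplest case where the posterior is available in closed form: one observation \(y \sim \operatorname{N}(z, \sigma^2)\) with \(z \sim \operatorname{N}(\mu_0, \tau_0^2)\). This conjugate setup and all numbers are assumptions for illustration only, not one of the models in this talk. Once posterior draws are available, an uncertainty interval is just a pair of quantiles.

```python
import random

def normal_posterior(y, sigma, mu0, tau0):
    """Posterior mean and sd of z when y ~ N(z, sigma^2) and z ~ N(mu0, tau0^2)."""
    precision = 1 / sigma ** 2 + 1 / tau0 ** 2
    mean = (y / sigma ** 2 + mu0 / tau0 ** 2) / precision
    return mean, precision ** -0.5

def credible_interval(samples, level=0.95):
    """Equal-tailed interval read off the sorted posterior draws."""
    s = sorted(samples)
    lo = s[int((1 - level) / 2 * len(s))]
    hi = s[int((1 + level) / 2 * len(s)) - 1]
    return lo, hi

# Hypothetical values: a noisy observation and a prior centered at zero.
post_mean, post_sd = normal_posterior(y=10.0, sigma=2.0, mu0=0.0, tau0=2.0)
rng = random.Random(0)
draws = [rng.gauss(post_mean, post_sd) for _ in range(4000)]
lo, hi = credible_interval(draws)
```

With equal data and prior variances the posterior mean sits halfway between the observation and the prior mean. For models without a closed form, the same `credible_interval` step applies to draws from an MCMC sampler.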

Example: Climate change

  • Climate change is well understood globally.

  • Climate change is less well understood locally.

  • Need for spatially explicit reconstructions of climate variables.

  • Problem: data sources are messy and noisy.

Local Prediction

Introduction

Predicting the future by learning from the past

  • Vegetation composition and structure change from ice age to current period.

  • Using change in temperature to predict future vegetation change.


Predicting the future by learning from the past

  • Classify compositional and structural change.

Predicting the future by learning from the past

  • Ordered multi-logistic B-spline regression.


  • Learn vegetation change in structure/composition given temperature change.


  • Forecast future vegetation change.



Predicting the future by learning from the past

  • Data model: Multi-logit distribution for ordered categories of observed change.

  • Process model: Assumes increasing temperature results in smooth changes of composition and structure.

  • Prior model: Not used.

Modeling Sediment mixing

Sediment Mixing

Sharman and Johnstone (2017). Sediment unmixing using detrital geochronology. Earth and Planetary Science Letters.

Goals

  • Estimate proportion of each parent in a daughter.
    • Mixing model.

  • Reconstruct unobserved parent distributions given daughters.
    • Unmixing model.

Data

Mixing Model: Estimate proportion of each parent in a daughter.

Data Model

\[ {\huge \begin{align*} [\mathbf{z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}] & \propto \color{red}{[\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}]} [\mathbf{z} | \boldsymbol{\theta}_P] [\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P] \end{align*} } \]

Data Model

  • Dating uncertainty.

  • \(y_{ib}\): age measurement on mineral \(i=1, \ldots, N_b\) for parent \(b = 1, \ldots, B\).

  • \(y_{id}\): age measurement for mineral \(i=1, \ldots, N\) of daughter \(d\).

  • Measurement error reported as standard deviation \(\sigma_{ib}\) (\(\sigma_{id}\)).

\[ \begin{align*} y_{ib} & \sim \color{red}{\operatorname{N}(z_{ib}, \sigma^2_{ib}).} \\ y_{id} & \sim \color{red}{\operatorname{N}(z_{id}, \sigma^2_{id}).} \end{align*} \]

Process Model

\[ {\huge \begin{align*} [\mathbf{z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}] & \propto [\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}] \color{blue}{[\mathbf{z} | \boldsymbol{\theta}_P]}[\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P] \end{align*} } \]

Process Model


Assumptions:


  • An unknown number of mineral creation events that are relatively discrete in geologic time.

  • Each parent is a mixture of minerals from creation events.

  • Each daughter is a mixture of parents.

Process Model: Parent Distribution

\[ \begin{align*} {z}_{ib} \sim \color{blue}{\sum_{k=1}^K p_{bk} \operatorname{N}(\mu_k, \sigma^2_k).} \end{align*} \]

Process Model - Mixing Model for daughter

  • Daughter is a mixture of parents.

\[ \begin{align*} z_{id} & \sim \color{blue}{\sum_{b=1}^B \phi_b \sum_{k=1}^K p_{bk} \operatorname{N}(\mu_k, \sigma^2_k).} \end{align*} \]

  • \(\phi_b\) is the proportion of daughter sediments from parent \(b\).
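The generative story behind the mixing model can be sketched as a two-step draw: pick a parent with probability \(\phi_b\), then pick a creation event within that parent with probability \(p_{bk}\). The code below is an illustration under assumed values (three creation events, two parents, ages in Ma, and a fixed measurement standard deviation); none of these numbers come from the study.

```python
import random

def sample_true_age(p_b, mu, sigma, rng):
    """Draw z ~ sum_k p_bk N(mu_k, sigma_k^2) for one parent."""
    k = rng.choices(range(len(p_b)), weights=p_b)[0]
    return rng.gauss(mu[k], sigma[k])

def sample_daughter_age(phi, p, mu, sigma, rng):
    """Pick a parent b with probability phi_b, then draw from that parent's mixture."""
    b = rng.choices(range(len(phi)), weights=phi)[0]
    return b, sample_true_age(p[b], mu, sigma, rng)

# Hypothetical values: K = 3 creation events (ages in Ma), B = 2 parents.
mu = [100.0, 500.0, 1200.0]
sigma = [10.0, 25.0, 40.0]
p = [[0.7, 0.3, 0.0],
     [0.0, 0.2, 0.8]]
phi = [0.25, 0.75]

rng = random.Random(42)
draws = [sample_daughter_age(phi, p, mu, sigma, rng) for _ in range(5000)]
share_from_parent_0 = sum(1 for b, _ in draws if b == 0) / len(draws)

# Data model: observed ages add dating error, y ~ N(z, sigma_id^2); 5.0 is assumed.
y = [rng.gauss(z, 5.0) for _, z in draws]
```

In the simulation the share of grains from each parent recovers \(\phi_b\); the inference problem runs the other way, estimating \(\phi_b\) from the observed ages `y`.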

Process Model - Mixing Model

\[ \begin{align*} \phi_1 = 0.200 \quad\quad\,\,\, \phi_2 = 0.532 \quad\quad\,\,\,\,\, \phi_3 = 0.268 \,\,\quad\quad\quad \mbox{Daughter} \end{align*} \]

Prior Model

\[ {\huge \begin{align*} [\mathbf{z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}] & \propto [\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}] [\mathbf{z} | \boldsymbol{\theta}_P]\color{orange}{[\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P]} \end{align*} } \]

Prior Model for the number of creation events

  • For each parent \(b\), the proportions of minerals from each creation event are \(\mathbf{p}_b\), where \(p_{bk} \geq 0\) and \(\sum_{k=1}^K p_{bk} = 1\).

  • Most of the \(p_{bk}\) are 0 (reasonable in the real world).

  • Unknown number of mineral creation events.

Dirichlet Process

  • Assigns observations to clusters.

  • Model for \(\mathbf{p}_b\).

  • Number of clusters increases with number of observations.
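One common way to draw cluster weights from a Dirichlet process is the truncated stick-breaking construction, sketched below. The talk does not say which construction the model uses, so treat this as an assumption; the concentration parameter `alpha` and the truncation level `n_sticks` are likewise illustrative names and values.

```python
import random

def stick_breaking(alpha, n_sticks, rng):
    """Truncated stick-breaking: v_k ~ Beta(1, alpha), p_k = v_k * prod_{j<k} (1 - v_j)."""
    weights, remaining = [], 1.0
    for _ in range(n_sticks - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # last stick takes what is left so the weights sum to 1
    return weights

rng = random.Random(7)
few = stick_breaking(alpha=0.5, n_sticks=20, rng=rng)    # small alpha: mass on a few events
many = stick_breaking(alpha=10.0, n_sticks=20, rng=rng)  # large alpha: mass spread over many
```

Small values of `alpha` tend to put most of the weight on a handful of creation events, while large values spread it across many, which is how the prior expresses "few" versus "many" creation events.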

Dirichlet Process

Many Creation Events

Dirichlet Process

Few Creation Events

Simulation Study: Mixing model

Mixing Estimates

Unmixing Model: Estimate unobserved parent distributions.

Unmixing

Unmixing Model


  • \(\color{red}{\mbox{Data model:}}\)

\[ \begin{align*} y_{id} & \sim \color{red}{\operatorname{N}(z_{id}, \sigma^2_{id}).} \end{align*} \]


  • \(\color{blue}{\mbox{Process model:}}\)

\[ \begin{align*} z_{id} & \sim \color{blue}{\sum_{b=1}^B \phi_{db} \sum_{k=1}^K p_{bk} \operatorname{N}(\mu_k, \sigma^2_k).} \end{align*} \]


  • \(\color{orange}{\mbox{Prior model:}}\)

    • Same Dirichlet Process clustering model as mixing model.

Simulation Study: Unmixing

Unmixing the data

Unmixing Reconstructions

Benefits:


  • Dirichlet process model can be used to describe sediment mixing processes.

  • Estimation of mixing and reconstruction with uncertainty.

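  • Can ask questions like: what is the probability that at least 50% of the sediment in daughter \(d\) comes from parent \(b\)? With posterior samples \(\phi_b^{(k)}\), \(k = 1, \ldots, K\) (here \(K\) counts posterior samples, not creation events), this is estimated by the fraction of samples above 0.5:

\[ \begin{align*} P(\phi_b > 0.5 \mid \mathbf{y}) \approx \frac{1}{K} \sum_{k=1}^K I\{ \phi_b^{(k)} > 0.5 \}. \end{align*} \]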
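In code, this kind of question reduces to counting posterior samples. The sketch below uses a short, made-up list of posterior draws of \(\phi_b\) (not output from the model in this talk) and reports the fraction that exceed 0.5.

```python
def posterior_probability(samples, threshold=0.5):
    """Fraction of posterior draws of phi_b that exceed the threshold."""
    return sum(1 for s in samples if s > threshold) / len(samples)

# Hypothetical posterior draws of phi_b, for illustration only.
phi_b_samples = [0.41, 0.55, 0.62, 0.48, 0.58, 0.51, 0.66, 0.45]
prob = posterior_probability(phi_b_samples)  # 5 of the 8 draws exceed 0.5
```

The same function answers other threshold questions (for example, "more than 80%") by changing `threshold`.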

Future extensions:

  • Account for spatial correlation among daughters.

  • Account for temporal correlation within a sediment core.

Learning about the past: Climate Proxy Data

Climate proxy data

  • Many ecological and physical processes respond to climate over different time scales.
    • Tree rings, corals, forest landscapes, ice rings, lake levels, etc.


  • These processes are called climate proxies.
    • They are proxy measurements for unobserved climate.
    • Noisy and messy.
    • Respond to a wide variety of non-climatic signals.

PalEON

Forward Model

Data.

Data Model.

Process Model.

Inverse Model

Data.

Data Model.

Process Model.

The Data

Data Model

\[ {\huge \begin{align*} [\mathbf{z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}] & \propto \color{red}{[\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}]} [\mathbf{z} | \boldsymbol{\theta}_P] [\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P] \end{align*} } \]

Climate Data Model

\[ \begin{align*} \begin{pmatrix} \mathbf{T}_{t} \\ \mathbf{P}_{t} \end{pmatrix} = \begin{pmatrix} \mathbf{T}_{\mbox{Jan}, t} \\ \vdots \\ \mathbf{T}_{\mbox{Dec}, t} \\ \mathbf{P}_{\mbox{Jan}, t} \\ \vdots \\ \mathbf{P}_{\mbox{Dec}, t} \\ \end{pmatrix} \sim \color{red}{\mbox{N} \left( \mathbf{A} \begin{pmatrix} \mathbf{T}_{t-1} \\ \mathbf{P}_{t-1} \end{pmatrix}, \boldsymbol{\Sigma} \right).} \end{align*} \]


  • Temperature and precipitation in sequential months (years) are more related to each other than months (years) far apart.
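A first-order vector autoregression of this form can be simulated in a few lines. The sketch below shrinks the 24-dimensional monthly state to two variables and replaces the full covariance \(\boldsymbol{\Sigma}\) with independent noise terms; both simplifications, and the entries of `A`, are assumptions made to keep the example short.

```python
import random

def simulate_var1(A, sd, x0, n_steps, rng):
    """Simulate x_t ~ N(A x_{t-1}, diag(sd^2)), a first-order vector autoregression."""
    path = [list(x0)]
    for _ in range(n_steps):
        prev = path[-1]
        mean = [sum(a * x for a, x in zip(row, prev)) for row in A]
        path.append([rng.gauss(m, s) for m, s in zip(mean, sd)])
    return path

# Hypothetical two-variable system: (temperature anomaly, precipitation anomaly).
A = [[0.8, 0.1],
     [0.0, 0.5]]
path = simulate_var1(A, sd=[0.3, 0.5], x0=[0.0, 0.0], n_steps=200, rng=random.Random(3))
```

Because each state depends on the previous one through `A`, values close in time are more alike than values far apart, which is the behavior the bullet above describes.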

Tree Ring Data Model

\[ \begin{align*} y_{i t j} & \sim \color{red}{\begin{cases} \mbox{N}\left(\beta_{0_j} + \beta_{1_j} f^{VS}\left(\mathbf{w}_t, \boldsymbol{\theta}^{VS}_j\right), \sigma^2_j \right) & \mbox{if } z_j = 0,\\ \mbox{N}\left(\tilde{\beta}_{0_j} + \tilde{\beta}_{1_j} f^{Pro}\left(\mathbf{w}_t, \boldsymbol{\theta}^{Pro}_j\right), \tilde{\sigma}^2_j \right) & \mbox{if } z_j = 1. \\ \end{cases}} \end{align*} \]


  • Regress observed tree ring \(y_{itj}\) onto simulated tree rings \(f^{VS}\left(\mathbf{w}_t, \boldsymbol{\theta}^{VS}_j\right)\) and \(f^{Pro}\left(\mathbf{w}_t, \boldsymbol{\theta}^{Pro}_j\right)\).

  • \(z_j\) - species-specific growth model form (VS or Pro).

  • Model chooses best growth model form for each species.

Process Model

\[ {\huge \begin{align*} [\mathbf{z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}] & \propto [\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}] \color{blue}{[\mathbf{z} | \boldsymbol{\theta}_P]}[\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P] \end{align*} } \]

Tree Ring Growth Model

\[ \begin{align*} f^{\ell}\left(\mathbf{w}_t, \boldsymbol{\theta}^{\ell}_j\right) & = \color{blue}{ \sum_{s=1}^{12} \chi_s \mbox{min} \left( g^{\ell}\left( T_{t,s}, \boldsymbol{\theta}^{\ell}_j \right), g^{\ell}\left( P_{t, s}, \boldsymbol{\theta}^{\ell}_j \right) \right),} \\ & \ell = VS \mbox{ or } Pro. \end{align*} \]
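The growth model takes, for each month, the smaller of a temperature response and a precipitation response and sums over the months weighted by \(\chi_s\). The sketch below assumes a simple piecewise-linear ramp for \(g\), with separate lower and upper thresholds for temperature and precipitation; the slides do not specify \(g\), so the `ramp` function, the thresholds, and the monthly climate values are all hypothetical.

```python
def ramp(value, lower, upper):
    """Hypothetical growth response: 0 below `lower`, 1 above `upper`, linear between."""
    if value <= lower:
        return 0.0
    if value >= upper:
        return 1.0
    return (value - lower) / (upper - lower)

def annual_growth(temps, precips, chi, t_bounds, p_bounds):
    """f = sum_s chi_s * min(g(T_s), g(P_s)) over the 12 months of one year."""
    return sum(
        c * min(ramp(t, *t_bounds), ramp(p, *p_bounds))
        for c, t, p in zip(chi, temps, precips)
    )

# Hypothetical monthly climate (deg C, mm) and growing-season weights chi_s.
temps = [-5, -3, 2, 8, 14, 19, 22, 21, 16, 9, 3, -2]
precips = [60, 55, 70, 80, 90, 100, 40, 30, 85, 75, 65, 60]
chi = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
growth = annual_growth(temps, precips, chi, t_bounds=(5, 20), p_bounds=(20, 100))
```

In this made-up year, growth is temperature-limited in spring and moisture-limited in July and August, which illustrates why a ring width alone cannot say which climate variable was limiting.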

The Inverse Problem

Prior Model

\[ {\huge \begin{align*} [\mathbf{z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}] & \propto [\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}] [\mathbf{z} | \boldsymbol{\theta}_P]\color{orange}{[\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P]} \end{align*} } \]

Prior model

  • Assumes growth responses follow an "ecological niche."

  • Tree species that grow in the Hudson Valley respond to similar climate so have similar responses.

  • Variations from common response are to exploit an "ecological niche" that allows many species to exist on the same landscape.

Ecological Niche

Simulation Study

Why is the temperature reconstruction poor?

Reconstruction

Pollen Data

Fossil Pollen Data

Data Model

\[ {\huge \begin{align*} [\mathbf{z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}] & \propto \color{red}{[\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}]} [\mathbf{z} | \boldsymbol{\theta}_P] [\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P] \end{align*} } \]

Data Model

\[ \begin{align*} \mathbf{y}\left( \mathbf{s}_i, t \right) & \sim \color{red}{\operatorname{Dirichlet-Multinomial} \left( N, \exp \left(\mathbf{z}\left( \mathbf{s}_i, t \right) \boldsymbol{\beta} \right)\right)} \end{align*} \]


  • Researchers take 1 cm\(^3\) cubes of sediment along the length of a sediment core from a lake.

  • Raw data are counts of each species \(y(\mathbf{s}_i, t)\) at site \(\mathbf{s}_i\) for time \(t\).

  • In each cube, the researcher counts the first \(N\) pollen grains and identifies each to species.

  • Climate variable \(\mathbf{z}(\mathbf{s}_i, t)\) at site \(\mathbf{s}_i\) for time \(t\).
    • Only known for time \(t=1\).
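To show what a draw from this data model looks like, the sketch below samples pollen counts from a Dirichlet-multinomial: proportions are drawn from a Dirichlet with concentrations \(\exp(z \beta_j)\), then \(N\) grains are allocated to taxa. The single scalar climate value, the three taxa, and the coefficients are assumptions for illustration.

```python
import math
import random

def sample_dirichlet_multinomial(n, alpha, rng):
    """Draw proportions from Dirichlet(alpha), then n counts from Multinomial(n, proportions)."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]  # normalized gammas are Dirichlet
    total = sum(gammas)
    probs = [g / total for g in gammas]
    counts = [0] * len(alpha)
    for k in rng.choices(range(len(alpha)), weights=probs, k=n):
        counts[k] += 1
    return counts

# Hypothetical setup: one climate value z and coefficients beta for 3 taxa.
z = 1.5
beta = [0.2, -0.4, 0.9]
alpha = [math.exp(z * b) for b in beta]  # log link keeps the concentrations positive
counts = sample_dirichlet_multinomial(n=300, alpha=alpha, rng=random.Random(11))
```

The counts always sum to \(N\), so only relative abundances carry information; the extra Dirichlet layer allows more sample-to-sample variability than a plain multinomial.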

Non-linear Data Model

  • Vegetation response to climate is non-linear.

  • Pollen are "aggregated" into groups across space and taxonomy.

    • "Niche" responses in the groups can produce multi-modal responses.

\[ \begin{align*} \mathbf{y}\left( \mathbf{s}_i, t \right) & \sim \color{red}{\operatorname{Dirichlet-Multinomial} \left( N, \exp\left( \mathbf{B} \left( \mathbf{z}\left( \mathbf{s}_i, t \right) \right) \boldsymbol{\beta} \right) \right)} \end{align*} \]


  • \(\mathbf{B} \left( \mathbf{z}\left( \mathbf{s}_i, t \right) \right)\) is a basis expansion of the covariates \(\mathbf{z}\left( \mathbf{s}_i, t \right)\).
    • Use B-splines or Gaussian Processes as a basis.
    • \(\mathbf{B} \left( \mathbf{z}\left( \mathbf{s}_i, t \right) \right)\) is random.
    • Computationally challenging.


  • For \(t \neq 1\), the \(\mathbf{z} \left( \mathbf{s}_i, t \right)\)s are unobserved.


Non-linear Calibration Model

Process Model

\[ {\huge \begin{align*} [\mathbf{z}, \boldsymbol{\theta}_D, \boldsymbol{\theta}_P | \mathbf{y}] & \propto [\mathbf{y} | \boldsymbol{\theta}_D, \mathbf{z}] \color{blue}{[\mathbf{z} | \boldsymbol{\theta}_P]}[\boldsymbol{\theta}_D] [\boldsymbol{\theta}_P] \end{align*} } \]

Dynamic Model

  • We are interested in estimating the latent process \(\mathbf{z} \left( \mathbf{s}, t \right)\).

\[ \begin{align*} \color{blue}{\mathbf{z} \left(t \right) - \mathbf{X} \left( t \right) \boldsymbol{\gamma}} & \sim \color{blue}{\operatorname{N}\left(\mathbf{M}\left(t\right) \left( \mathbf{z} \left(t-1 \right) - \mathbf{X} \left( t \right) \boldsymbol{\gamma} \right), \boldsymbol{R}\left( \boldsymbol{\theta} \right) \right)} \end{align*} \]


  • Assumes climate states nearby in time are more correlated than those further apart in time.

  • \(\mathbf{X} \left(t \right) \boldsymbol{\gamma}\) are the fixed effects from covariates like latitude, elevation, etc.

Elevation covariates

Scaling the process for big data

  • Define a set of spatial knot locations \(\mathbf{s}^{\star} = \left\{ \mathbf{s}_1^{\star}, \ldots, \mathbf{s}_m^{\star} \right\}\).

  • \(\boldsymbol{\eta}^{\star} \left( t \right) \sim \operatorname{N} \left( \mathbf{0}, \mathbf{R}^{\star}\left( \boldsymbol{\theta} \right) \right)\).

  • \(\mathbf{R}^{\star}\left( \boldsymbol{\theta} \right)\) is the spatial covariance defined at the knot locations \(\mathbf{s}^{\star}\).

  • The linear interpolator from observed location \(\mathbf{s}_i\) to knot location \(\mathbf{s}_j^{\star}\) is \(\mathbf{r} \left(\mathbf{s}_i, \mathbf{s}_j^{\star} \right) \mathbf{R}^{\star}\left( \boldsymbol{\theta} \right)^{-1}\), where \(\mathbf{r} \left(\mathbf{s}_i, \mathbf{s}_j^{\star} \right)\) is the covariance between the process at \(\mathbf{s}_i\) and the process at \(\mathbf{s}_j^{\star}\).

Banerjee S, Gelfand AE, Finley AO and Sang H (2008). "Gaussian predictive process models for large spatial data sets." Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(4), pp. 825-848.

Predictive Process

  • \(\boldsymbol{\eta} \left( t \right) \approx \mathbf{r} \left(\mathbf{s}, \mathbf{s}^{\star} \right) \mathbf{R}^{\star}\left( \boldsymbol{\theta} \right)^{-1} \boldsymbol{\eta}^{\star} \left( t \right)\).

  • The predictive process can be shown to be the optimal predictor of dimension \(m\) of the parent process \(\boldsymbol{\eta} \left( t \right)\).

  • The dynamic climate process becomes

\[ \begin{align*} \mathbf{z} \left(t \right) - \mathbf{X} \left( t \right) \boldsymbol{\gamma} & \approx \mathbf{M}\left(t\right) \left( \mathbf{z} \left(t-1 \right) - \mathbf{X} \left( t \right) \boldsymbol{\gamma} \right) + \mathbf{r} \left(\mathbf{s}, \mathbf{s}^{\star} \right) \mathbf{R}^{\star}\left( \boldsymbol{\theta} \right)^{-1} \boldsymbol{\eta}^{\star} \left(t \right) \end{align*} \]
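The interpolation weights \(\mathbf{r} \left(\mathbf{s}, \mathbf{s}^{\star} \right) \mathbf{R}^{\star}\left( \boldsymbol{\theta} \right)^{-1}\) can be computed by solving a small linear system at the knots. The sketch below assumes one-dimensional locations, an exponential covariance function, and three knots; these choices, and the helper names, are illustrative and not taken from the talk or from the BayesComposition package.

```python
import math

def exp_cov(s1, s2, sigma2=1.0, length=2.0):
    """Hypothetical exponential covariance between two 1-D locations."""
    return sigma2 * math.exp(-abs(s1 - s2) / length)

def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(M, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(n):
            if r != c:
                f = aug[r][c] / aug[c][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def interpolation_weights(s, knots):
    """Row of r(s, s*) R*^{-1}: weights that map knot values to location s."""
    R_star = [[exp_cov(a, b) for b in knots] for a in knots]
    r = [exp_cov(s, k) for k in knots]
    # R* is symmetric, so r R*^{-1} equals the solution w of R* w = r.
    return solve(R_star, r)

knots = [0.0, 5.0, 10.0]
w_at_knot = interpolation_weights(5.0, knots)   # location coincides with the middle knot
w_between = interpolation_weights(2.5, knots)   # location between the first two knots
```

At a knot the weights reduce to an indicator for that knot, so the approximation reproduces the knot value exactly. The computational gain comes from working with the \(m \times m\) matrix at the knots rather than a covariance over every observed location.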

Time Uncertainty

  • Each fossil pollen observation includes estimates of time uncertainty.
    • The time of the observation is uncertain.
    • Weight the likelihoods according to age-depth model.
    • Posterior distribution of ages.


  • For each fossil pollen observation, an age-depth model gives a posterior distribution over dates.
    • Define \(\omega \left(\mathbf{s}_i, t \right)\) as P(age \(\in (t-1, t)\)).
    • \([\mathbf{y} \left( \mathbf{s}_i, t \right) | \boldsymbol{\alpha} \left( \mathbf{s}_i, t \right) ] = \prod_{t=1}^T [\mathbf{y} \left( \mathbf{s}_i, t \right) | \boldsymbol{\alpha} \left( \mathbf{s}_i, t \right)]^{\omega \left(\mathbf{s}_i, t \right)}\).
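On the log scale, raising each likelihood to the power \(\omega\) and multiplying becomes a weighted sum of log-likelihoods. The sketch below shows that calculation for one pollen sample with three candidate time bins; the log-likelihood values and age probabilities are made up for illustration.

```python
def weighted_log_likelihood(log_liks, weights):
    """log prod_t L_t^{w_t} = sum_t w_t * log L_t, with w_t from the age-depth posterior."""
    return sum(w * ll for w, ll in zip(weights, log_liks))

# Hypothetical log-likelihoods of one pollen sample under the climate state in
# three candidate time bins, and the age-depth probabilities of those bins.
log_liks = [-12.0, -10.0, -15.0]
omega = [0.2, 0.7, 0.1]
total = weighted_log_likelihood(log_liks, omega)
```

Time bins that the age-depth model considers unlikely contribute little, so a sample informs the climate state mainly at the times it plausibly dates to.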

Implementation

GitHub package

devtools::install_github("jtipton25/BayesComposition")


  • Includes options for multiple models including:
    • mixture models.
    • different likelihoods and link functions.
    • correlations in functional response.


  • Code is written in C++ using the Rcpp package for computational speed.

Simulation Study

Simulated data

Simulated Reconstruction

Simulated Reconstruction Temporal Trend

Reconstruction

Reconstruction over time

Reconstruction Inference

  • Current methods are site-level "transfer function" methods.
    • These methods ignore elevation, temporal autocorrelation, and spatial autocorrelation.
    • Sensitive to the data.
    • Poor quantification of uncertainty.
    • Unclear how to choose among models.
  • The spatial method is statistically principled.
    • Has higher power.
    • Smaller uncertainties that change with data (sample size, signal coherence, etc.).
    • Can use model selection methods (information criteria, etc.).

Conclusion

Model framework opens the door to answering meaningful questions:

  • Do pollen distributions change with elevation?
    • Covariate-sensitive parameterizations.
  • Do pollen distributions change over space or time?
    • Regression coefficients vary over space/time.
  • How to combine multiple proxies (tree rings, pollen, etc.)?
    • Each proxy gets its own data model.
    • Proxies link to dynamic space-time process.

Conclusion

It is possible to put the science in your statistics.

  • Takes some careful thinking and learning.

  • Opens the door to more powerful analyses.

  • More flexibility in the questions that can be answered.

Thank You

Trend Comparison